Emotion Recognition using Acoustic and Lexical Features
Authors
Abstract
In this paper we present an innovative approach for utterance-level emotion recognition by fusing acoustic features with lexical features extracted from automatic speech recognition (ASR) output. The acoustic features are generated by combining: (1) a novel set of features derived from segmental Mel Frequency Cepstral Coefficients (MFCC) scored against emotion-dependent Gaussian mixture models, and (2) statistical functionals of low-level feature descriptors such as intensity, fundamental frequency, jitter, and shimmer. These acoustic features are fused with two types of lexical features extracted from the ASR output: (1) presence/absence of word stems, and (2) bag-of-words sentiment categories. The combined feature set is used to train support vector machines (SVM) for emotion classification. We demonstrate the efficacy of our approach by performing four-way emotion recognition on the University of Southern California's Interactive Emotional Dyadic Motion Capture (USC-IEMOCAP) corpus. Our experiments show that the fusion of acoustic and lexical features delivers an emotion recognition accuracy of 65.7%, outperforming the previously reported best results on this challenging dataset.
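The fusion strategy described above can be illustrated with a minimal sketch: acoustic and lexical feature vectors are concatenated at the feature level and fed to an SVM. This is not the authors' implementation; the feature values below are synthetic stand-ins (the real system would extract GMM log-likelihood scores, prosodic functionals, and ASR-derived lexical indicators), and the dimensions and class labels are illustrative assumptions.

```python
# Sketch of feature-level (early) fusion of acoustic and lexical features
# for four-way SVM emotion classification. All feature values are random
# placeholders standing in for real extracted features.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

n_utts = 200
# Stand-in for GMM-scored MFCC features plus low-level-descriptor functionals.
acoustic = rng.normal(size=(n_utts, 32))
# Stand-in for binary word-stem presence/absence and sentiment-category counts.
lexical = rng.integers(0, 2, size=(n_utts, 50)).astype(float)

X = np.hstack([acoustic, lexical])      # early fusion: concatenate per utterance
y = rng.integers(0, 4, size=n_utts)     # four emotion classes (e.g. angry/happy/sad/neutral)

# Scaling matters when mixing continuous acoustic and binary lexical features.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X[:150], y[:150])
preds = clf.predict(X[150:])
print(preds.shape)
```

Because the labels here are random, the classifier's accuracy is meaningless; the sketch only shows the shape of the fusion-plus-SVM pipeline.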
Similar Resources
Recognizing Emotions in Dialogues with Disfluencies and Non-verbal Vocalisations
We investigate the usefulness of disfluencies and non-verbal vocalisations (DIS-NV) for recognizing human emotions in dialogues. The proposed features measure filled pauses, fillers, stutters, laughter, and breath in utterances. The predictiveness of DIS-NV features is compared with lexical features and state-of-the-art low-level acoustic features. Our experimental results show that using DIS-NV...
Compensating for Speaker or Lexical Variabilities in Speech for Emotion Recognition
Affect recognition is a crucial requirement for future human-machine interfaces to effectively respond to nonverbal behaviors of the user. Speech emotion recognition systems analyze acoustic features to deduce the speaker's emotional state. However, the human voice conveys a mixture of information including speaker, lexical, cultural, physiological and emotional traits. The presence of these commun...
Robust Recognition of Emotion from Speech
This paper presents robust recognition of selected emotions from salient spoken words. The prosodic and acoustic features were used to extract the intonation patterns and correlates of emotion from speech samples in order to develop and evaluate models of emotion. The computed features are projected using a combination of linear projection techniques for compact and clustered representation of ...
Emotion Detection in Persian Text; A Machine Learning Model
This study aimed to develop a computational model for recognition of emotion in Persian text as a supervised machine learning problem. We considered the Plutchik emotion model as the supervised learning criterion and a Support Vector Machine (SVM) as the baseline classifier. We also used the NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...
Publication date: 2012